Abstract

Background: The tomato (Solanum lycopersicum L.) is the most widely grown vegetable in the world. It was
domesticated in Latin America, and Italy and Spain are considered secondary centers of diversification. This food
crop has experienced severe genetic bottlenecks, and modern breeding activities have been characterized by trait
introgression from wild species and divergence into different market classes.

Results: To examine patterns of polymorphism, characterize population structure and identify putative
loci under positive selection, we genotyped 214 tomato accessions (including cultivated landraces, commercial
varieties and wild relatives) using a custom-made Illumina SNP panel. Most of the 175 successfully scored SNP loci
were found to be polymorphic. Population structure analysis and estimates of genetic differentiation indicated that 
landraces constitute distinct sub-populations. Furthermore, contemporary varieties could be separated into groups
(processing, fresh and cherry) that are consistent with recent breeding aimed at market-class specialization. In
addition, at the 95% confidence level, we identified 30, 34 and 37 loci under positive selection between landraces
and each of the groups of commercial varieties (cherry, processing and fresh market, respectively). Their number and
genomic locations imply the presence of some extended regions with high genetic variation between landraces 
and contemporary varieties. 

Conclusions: Our work provides knowledge concerning the level and distribution of genetic variation within 
cultivated tomato landraces and increases our understanding of the genetic subdivision of contemporary varieties. 
The data indicate that adaptation and selection have led to a genomic signature in cultivated landraces and that 
the subpopulation structure of contemporary varieties is shaped by directed breeding and largely of recent origin. 
The genomic characterization presented here is an essential step towards a future exploitation of the available 
tomato genetic resources in research and breeding programs. 
Background

The cultivated tomato (Solanum lycopersicum L.) was 
probably domesticated in Mexico from wild species that 
originated in the Andean region, although other hypotheses
have also been put forward [1]. In the XVI century
tomato cultivation, which was already well-developed in 
Central America, was introduced to Europe by Spanish 
Conquistadors. Although initially viewed as a botanical 
curiosity, the tomato was almost immediately introduced 
into the cuisine of different European regions around
the Mediterranean basin, starting in Spain and Southern 
Italy [2,3]. The tomato later spread to other continents 
and reached, for instance, North America during the 
time of the European colonization. At the end of the
XIXth century, tomato varieties were still open-pollinated
and seeds from the best plants and/or fruits were
saved by the farmers every year. Much of the breeding
effort took place in the XXth century, when clear distinctions
among market classes, such as processing and
fresh market, were made [4].

As with most edible plants, it is likely that the
first cultivated tomatoes were directly sampled from wild
populations and then improved to obtain a series of
types amenable to cultivation. Selection for diverse fruit
shapes is one of the distinctive features of the tomato 
history, along with adaptation to local conditions [1,4]. 
Breeding goals have varied and included yield, reduction 
of production costs, stress resistance, shelf-life and, 
more recently, taste and nutritional value [1]. Breeding 
history is associated with apparently contrasting forces. 
On one hand, tomato suffered different bottlenecks and, 
when compared with the rich reservoir present in its 
wild relatives, the amount of genetic variation of the cul- 
tivated tomato is considered very limited [5]. On the 
other hand, since the last century, breeding has been 
characterized by the introgression of genes for stress re- 
sistance from wild species, which has expanded genetic 
variation [6,7]. The recent tomato genome sequencing 
indicated that several chromosomal segments within 
cultivated varieties are more closely related to S. pimpi- 
nellifolium than to Heinz 1706. The latter carries intro- 
gressions from S. pimpinellifolium, which has also been 
used for the introduction of disease resistance traits, on 
several chromosomes (4, 9, 11 and 12) [8]. Tomato
breeding expanded and fixed differences in specific 
traits. For instance, fruit size, colour and shape pre- 
sent a morphological variety absent in wild species 
[4], although recent selection may have unintention- 
ally diminished fruit quality in exchange for produc- 
tion traits [9]. 

Italy and Spain are considered secondary centres of di- 
versification [1,10,11]. In Italy, a number of tomatoes 
with different fruit shapes have been documented since 
the early days of cultivation [12]. All these types de- 
veloped into landraces, adapted to the cropping prac- 
tices and social background in which they were used 
[1,12,13]. It is believed that over the past decades, the 
cultivated tomato suffered another reduction of diversity 
due to the disappearance of local varieties [14,15]. In 
Italy, despite the good adaptation of landraces to local 
climatic and soil conditions, the advent of highly pro- 
ductive cultivars after WWII resulted in a very signifi- 
cant decline of their cultivation [13]. Considering the 
number of documented names and morphological de- 
scriptions of home-grown tomato types [16], only a frac- 
tion are currently present in local markets [12,17,18]. 
However, cultivated landraces fetch a premium price for 
their superior flavour and consumers' affection [19-21]. 

The analysis of genetic variation in tomato populations 
has initially focused on differences between wild species 
and cultivated varieties. More recently, greater attention 
has been given to the study of the variability present 
within contemporary varieties. In tomato, inferred
subpopulations are associated with breeding history and
market classes [6,22,23]. It has also been reported that
selection for market specialization and for geographic
adaptation contributes to the population structure of
tomato cultivars [14,22].

The major goals of current tomato breeders (e.g. high
quality fruits) require a good understanding and ma- 
nagement of the diversity within cultivated genetic re- 
sources [24]. Interpreting patterns of genetic variability 
in cultivated landraces of economically important crops 
allows breeders to reconsider this trait-reservoir and, 
eventually, to identify novel alleles or haplotypes to im- 
prove productivity, adaptation, quality and nutritional 
value [25]. To date, much of this germplasm has not 
been extensively characterized and most of the landraces 
have yet to be employed in modern plant breeding [26]. 
Therefore, the study of crop landraces not only provides 
biological knowledge about its history and value, but is 
also essential for biodiversity-based breeding [27]. The 
availability of cost-effective, accurate and fast genoty- 
ping assays has made Single Nucleotide Polymorphism 
(SNP) the most frequently used DNA marker for high- 
throughput analysis of plants, encouraging the analysis 
of sequence variation in germplasm collections. In differ- 
ent plant species molecular data have been used to infer 
the existence of a genetic structure in the collection 
studied, or to assign individuals to genetically differenti- 
ated groups that may be consistent with their ancestry, 
geographical origin, domestication and/or breeding his- 
tory [28-30]. 

In this work we genotyped a wide collection of Italian 
tomato landraces along with contemporary varieties and 
wild species. The main goal was to understand whe- 
ther the human- and environment-driven selection in- 
fluenced the distribution of genetic variation between 
contemporary and traditional accessions, leading to 
the maintenance of a distinct genetic diversity. Furthermore,
by using an Fst outlier approach, we identified
putative loci that may account for the formation of genetically
differentiated subpopulations.

Results 

Genetic diversity 

A total of 177 SNP loci, distributed over the twelve 
chromosomes, were used to evaluate genetic diversity in 
214 genotypes (Additional file 1: Table S1). Two SNPs
(SGNU3 12374-382 and Le004122-27) were removed 
from subsequent analysis because their flanking sequen- 
ces map to two locations of the tomato reference gen- 
ome. The DNA analysis indicated that eleven SNPs were 
monomorphic. In addition, seventeen SNPs were monomorphic
among cultivated Solanum lycopersicum genotypes.
The summary SNP statistics, which also include
Gene Diversity, Heterozygosity and Polymorphic Infor- 
mation Content (PIC), are presented in Additional file 2: 
Table S2 for all genotypes and Additional file 3: Table S3 
for the S. lycopersicum varieties and accessions. Allele 
counts and related frequencies varied among loci
(Additional file 4: Table S4) and almost half of the 
polymorphic loci (49%) presented a major allele fre- 
quency higher than 90%. The calculation of the allele 
frequency for the four predefined S. lycopersicum sub- 
populations (landraces, processing, fresh-market and 
cherry) allowed the identification of private alleles (i.e.
those occurring in only one population in pairwise com- 
parisons) (Additional file 5: Table S5). Figure 1A shows a 
Venn diagram indicating the number of alleles that are ex- 
clusive to the various group combinations. Overall, the 
market class cherry presented the highest number of pri- 
vate alleles, while cultivated landraces possess only one 
private allele when compared to the fresh market varieties. 
The number of alleles that are absent in each of the predefined
tomato groups was highest for landraces (62), followed
by fresh (42), processing (29) and cherry cultivars
(8). The number of minor alleles per group (i.e. those with
a frequency lower than 0.05) is presented in Figure 1B.
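
The definitions of private and rare alleles used above can be illustrated with a minimal Python sketch. The allele labels and frequencies below are hypothetical and serve only as an illustration; the function names are not taken from any software used in this study, and the frequency classes are assumed to be mutually exclusive (absent, below 0.01, from 0.01 to below 0.05).

```python
def private_alleles(freq_a, freq_b):
    """Return alleles observed in group A (frequency > 0) but absent from group B."""
    return {
        allele
        for allele, freq in freq_a.items()
        if freq > 0 and freq_b.get(allele, 0.0) == 0
    }


def rare_allele_counts(freqs):
    """Count alleles that are absent, below 0.01, or from 0.01 to below 0.05."""
    counts = {"absent": 0, "<0.01": 0, "<0.05": 0}
    for freq in freqs.values():
        if freq == 0:
            counts["absent"] += 1
        elif freq < 0.01:
            counts["<0.01"] += 1
        elif freq < 0.05:
            counts["<0.05"] += 1
    return counts


# Hypothetical allele frequencies (allele label -> frequency), for illustration only.
cherry = {"snp1_A": 0.60, "snp1_G": 0.40, "snp2_C": 0.97, "snp2_T": 0.03}
landraces = {"snp1_A": 1.00, "snp1_G": 0.00, "snp2_C": 0.995, "snp2_T": 0.005}

print(private_alleles(cherry, landraces))
print(rare_allele_counts(landraces))
```

In this toy example the allele snp1_G is private to the cherry group in the cherry vs landraces comparison, while the landraces carry one absent allele and one allele with a frequency below 0.01.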



Table 1 reports the average allelic richness and the 
average number of alleles. The allelic richness, for both 
coding and non-coding SNPs, was higher for cherry to- 
matoes and lower for landraces. Moreover, the highest 
average number of alleles per locus was found for the 
non-coding SNPs in the landraces, while for the three 
market classes of commercial cultivars, there was a 
slightly higher allelic richness for coding SNPs. The ana- 
lysis of the inter-groups allelic richness per locus showed 
low yet statistically significant differences for all but the 
fresh-processing comparison, corroborating the presence 
of group-specific differences in the frequency of the ana- 
lyzed SNPs (Additional file 6: Table S6). 
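
Allelic richness is usually computed by rarefaction, so that groups of different size can be compared. The following sketch assumes the standard rarefaction estimator (the expected number of distinct alleles in a subsample of fixed size); the text does not state which implementation was used, so the function and its arguments are illustrative only.

```python
from math import comb


def allelic_richness(allele_counts, sample_size):
    """Expected number of distinct alleles in a random subsample of gene copies.

    allele_counts: observed copies of each allele at one locus in one group.
    sample_size: number of gene copies to rarefy to (at most the total count).
    """
    total = sum(allele_counts)
    if sample_size > total:
        raise ValueError("sample_size cannot exceed the number of gene copies")
    denom = comb(total, sample_size)
    # For each allele, 1 minus the probability that the subsample misses it.
    return sum(
        1.0 - comb(total - count, sample_size) / denom
        for count in allele_counts
        if count > 0
    )


# Hypothetical biallelic locus: 5 copies of each allele, rarefied to 2 copies.
print(allelic_richness([5, 5], 2))
```

For a biallelic SNP the value ranges from 1 (monomorphic in the group) to 2, which matches the range of the values reported in Table 1.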

Population structure 

We investigated the possible population structure with- 
out introducing any a priori classification. The identifi- 
cation of genetically homogeneous groups of plants was 
performed using an admixture model-based clustering 
Figure 1 Allelic distribution in the cultivated S. lycopersicum groups. A: Distribution of private alleles in the predefined S. lycopersicum
groups. The Venn diagram illustrates the number of alleles that are exclusive to the various combinations among the four pre-defined groups of
cultivated tomatoes. B: Number of rare alleles in the pre-defined groups of cultivated tomatoes. For each bar, the height of the colored
segments represents the number of alleles that are absent (blue), with a frequency lower than 0.01 (red) or lower than 0.05 (green). See Additional
file 5: Table S5 for the name of the alleles.
Table 1 Average allelic richness (± standard deviation) and alleles per locus in the predefined
S. lycopersicum subpopulations

              Allelic richness                              Alleles per locus
              All           Coding        Non coding        All     Coding   Non coding
Total         1.79 ± 0.41   1.74 ± 0.39   1.70 ± 0.42       1.91    1.87     1.79
Cherry        1.87 ± 0.42   1.83 ± 0.38   1.73 ± 0.45       1.87    1.83     1.73
Fresh         1.67 ± 0.49   1.61 ± 0.48   1.56 ± 0.50       1.67    1.62     1.56
Landraces     1.52 ± 0.46   1.38 ± 0.42   1.42 ± 0.43       1.61    1.48     1.52
Processing    1.69 ± 0.47   1.63 ± 0.44   1.54 ± 0.45       1.75    1.70     1.63


analysis implemented in the software Structure [31].
Three data sets were independently used: the genotyping
results with 175 SNPs, 127 non-coding SNPs or 48 coding
SNPs. For the whole set of markers, both
Evanno's test and the non-parametric Kruskal-Wallis
analysis indicated that the most informative number of
subpopulations (K) was 7 (Additional file 7: Figure S1a
and S1b). The Structure analysis provided data for a biological
interpretation of the sub-population structure
based on the origin and market classes of the contemporary
varieties. For this reason, clusters were named according
to the a priori group of varieties with the largest
membership coefficient. The inferred population structure
is presented in Figure 2A and membership coefficients
in Additional file 8: Table S7. Landraces grouped
together (olive green and light blue). The only exception
was the 'Spongillo' accession, characterized by small
pointed red fruits, which was assigned to the group of
contemporary cherry tomatoes. This variety was recently
collected from a local farmer and its origin is unknown.
The second cluster mainly represents plants with
oxheart (heart-shaped) fruits, such as the 'Sorrento'
and 'Cuor di Bue' types. The contemporary varieties
were distributed across more than one group and a distinction
could be made among processing, fresh market
and cherry tomatoes. The processing varieties were
present in one cluster (orange). For fresh market tomatoes
a large number of plants appeared to have ancestry
in more than one of the Structure clusters. Two Structure
groups were specific for this market class (dark blue and
azure). Admixture with the landrace cluster was evident
for cultivars with oxheart fruits (i.e. 'Rhodia', 'Goldmar',
'PS18 3 2693', 'Gotico', 'Margot'), which often displayed the
highest membership coefficient in the group of oxheart
landraces. More than half of the cherry varieties showed
the highest membership coefficient for a specific cluster
(dark red), while the remaining showed admixture, primarily
with the processing varieties. Finally, as expected,
the tomato's wild relatives constituted a well separated
cluster (purple). The Structure analysis also indicated
that among the wild species tested, S. pimpinellifolium
has the highest admixture with cherry tomatoes [32].
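
The choice of K with Evanno's test relies on the second-order rate of change of the log-likelihood across successive K values, scaled by the variability among replicate runs. A minimal sketch of this statistic is given below. It assumes that several replicate Structure runs are available for each K and uses the mean log-likelihood per K in the second difference; the log-likelihood values are hypothetical and are not results of this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of log-likelihoods from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            if sd > 0:  # delta K is undefined when replicates are identical
                delta[k] = second_diff / sd
    return delta


# Hypothetical replicate log-likelihoods, for illustration only.
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4502.0, -4498.0],
    3: [-4400.0, -4401.0, -4399.0],
    4: [-4390.0, -4395.0, -4385.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

The most informative K is taken as the one with the largest delta K; when no clear peak is present, as reported for some data sets, other criteria are needed.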



For the non-coding SNPs, both Evanno's test and
the Kruskal-Wallis analysis of the log-likelihood variance
indicated that the most informative K was 7 (Additional
file 7: Figure S1c and S1d). Population structure analysis
defined clusters that were associated with a priori tomato
type-based groups (Figure 2B). Landraces were divided
into two well-defined clusters. Among the contemporary
varieties, processing varieties assorted together. Non-coding
SNPs showed the highest level of admixture for
the fresh-market tomatoes. Furthermore, the admixture
of the cherry varieties with the processing group was
more evident. Finally, wild tomatoes grouped separately.

For the coding SNPs, the second order rate of change
of the likelihood function with respect to K (ΔK) did not
show any clear peak at the values tested. The Kruskal-Wallis
analysis indicated that the minimum K-value that
produced higher likelihood solutions (P < 0.01) was 10,
while subsequent K-values had statistically similar solutions
(P = 0.492). When the log likelihood score reached
a plateau, there was an asymmetric distribution of genotypes
and some individuals were strongly assigned to
populations, corroborating the presence of a real population
structure [31]. At a K-value of 10, a biological interpretation
of the assignment was evident (Figure 2C). A
division of the genotypes according to the different tomato
types was consistent with the previous analysis,
but coding SNPs identified further subdivisions. The landraces
were partitioned into three sub-groups (orange, purple
and lilac). Although plants with different fruit shapes
were present in each of these groups, approximately half
of the plants of the orange group were characterized by
having small round/plum fruits. Similarly, the purple
group was mostly characterized by plants with cylindrical,
elongated 'San Marzano' type fruits, and the lilac group
by plants with oxheart fruits. Processing varieties were divided
into two well separated clusters (green and azure).
The fresh market varieties were assigned to different clusters.
The majority of the varieties were present in two
groups (pink and blue). The others displayed a high membership
coefficient with a landraces subpopulation (4
genotypes with oxheart fruits; purple) and processing varieties
(azure). Two were specific for this market class (pink
Figure 2 Estimated population structure of the tomato genotypes. Each genotype is represented by a horizontal line, which is partitioned
into colored segments that represent the estimated membership fractions in the K clusters. A) Population structure inferred using the whole SNP
dataset for K = 7. B) Population structure inferred using the non-coding SNPs for K = 7. C) Population structure inferred using the coding SNPs for
K = 10. See Additional file 7: Figure S1 for the determination of the most informative K value.



and blue); however, the other accessions had the highest
membership coefficient in the landraces' group (4 genotypes
with oxheart fruits; pink) and processing varieties
(azure). Cherry varieties were mostly grouped into two
clusters (dark red and light blue) while others displayed a
high level of admixture. Wild tomatoes assorted together
(light green).

We tested whether the groups inferred by the population
structure analysis, or those defined a priori,
represent statistically significant subpopulations by pairwise
comparison of two measures of differentiation: Fst
and Nei's standard genetic distance (Dst). The results of
the two indices (Table 2) were not correlated (P > 0.05,
Spearman's rho test). As expected, higher genetic distances
and Fst values were found for the comparison between
cultivated material and wild species. The degree
of gene differentiation among pre-defined S. lycopersicum
groups in terms of allele frequencies indicated that
Table 2 Estimation of genetic differentiation and distance

Predefined groups   Cherry   Fresh    Landraces   Processing   Wild
Cherry              -        0.16**   0.29**      0.12**       0.46**
Fresh               0.04     -        0.11**      0.11**       0.60**
Landraces           0.07     0.03     -           0.25**       0.69**
Processing          0.03     0.02     0.06        -            0.58**
Wild                0.25     0.34     0.39        0.32         -

Structure groups    C1       F1       F2       L1       L2       P1       W1
C1                  -        0.24**   0.24**   0.37**   0.33**   0.17**   0.49**
F1                  0.07     -        0.11**   0.19**   0.17**   0.12**   0.62**
F2                  0.06     0.03     -        0.24**   0.18**   0.16**   0.59**
L1                  0.10     0.04     0.05     -        0.07**   0.28**   0.69**
L2                  0.10     0.03     0.05     0.01     -        0.21**   0.69**
P1                  0.04     0.03     0.04     0.07     0.05     -        0.57**
W1                  0.26     0.36     0.33     0.38     0.39     0.31     -

Pairwise estimates of Fst and Nei's standard genetic distance (Dst) between predefined groups or between groups of tomato accessions as inferred by the
Bayesian analysis implemented in the Structure software.

Above the diagonal is the pairwise estimate of Fst, while Dst appears below the diagonal. Global Fst was 0.19 (P < 0.01) within the four tomato groups and 0.26
(P < 0.01) within the Structure groups. The P value for the estimated Fst was calculated using 10,000 permutations (**: P < 0.01).



landraces represent the most distinct subpopulation com- 
pared to each of the contemporary groups considered, 
as also indicated by the higher Dst values. The three 
predefined groups of commercial varieties were also 
significantly different. A minimum genetic distance was 
determined between fresh market and processing varieties. 
Both coding and non-coding SNPs were able to support 
these conclusions (Additional file 9: Table S8) and pro- 
vided higher values of genetic differentiation and distance. 
The Structure grouping indicated the presence of a greater 
subdivision for the landraces and the fresh market groups. 
These subdivisions were supported by the Fst and Dst 
values and for each of the two tomato classes the intra- 
group differentiation and genetic distance were lower 
when compared to the inter-groups values (Table 2). 
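
To make the two indices in Table 2 concrete, the sketch below computes Nei's standard genetic distance and a simple heterozygosity-based Fst for two groups from the frequency of one allele at each biallelic SNP. This is an illustration under stated assumptions: the Fst shown is the basic (Ht - Hs)/Ht form with equal group weights, which is not necessarily the estimator used to produce Table 2, and the frequencies are hypothetical.

```python
import math


def nei_standard_distance(p_x, p_y):
    """Nei's standard distance D = -ln(Jxy / sqrt(Jx * Jy)) for biallelic loci."""
    n = len(p_x)
    jx = sum(p * p + (1 - p) ** 2 for p in p_x) / n
    jy = sum(p * p + (1 - p) ** 2 for p in p_y) / n
    jxy = sum(a * b + (1 - a) * (1 - b) for a, b in zip(p_x, p_y)) / n
    return -math.log(jxy / math.sqrt(jx * jy))


def fst_two_groups(p_x, p_y):
    """Fst = (Ht - Hs) / Ht averaged over loci; None when no locus is polymorphic."""
    n = len(p_x)
    hs = sum((2 * a * (1 - a) + 2 * b * (1 - b)) / 2 for a, b in zip(p_x, p_y)) / n
    ht = sum(2 * ((a + b) / 2) * (1 - (a + b) / 2) for a, b in zip(p_x, p_y)) / n
    if ht == 0:
        return None  # Fst cannot be determined for monomorphic data
    return (ht - hs) / ht


# Hypothetical frequencies of one allele at two SNPs in two groups.
group_x = [0.9, 0.5]
group_y = [0.1, 0.5]
print(nei_standard_distance(group_x, group_y))
print(fst_two_groups(group_x, group_y))
```

Returning None when total heterozygosity is zero mirrors the fact that Fst cannot be determined for loci that are monomorphic in both groups being compared.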

Coding and non-coding SNPs yielded different subpopulation
structures. The analysis of genetic differentiation
supported the divisions defined by non-coding
SNPs (Additional file 9: Table S8). The additional subdivisions
yielded by the coding SNPs were not always statistically
supported (Additional file 9: Table S8). The
subdivision of landraces into three clusters was significant,
as well as the subdivisions of the processing and of
the cherry varieties. The analysis of population structure
indicated that fresh market tomatoes could be assigned
to four groups. However, within them, three subgroups
were not statistically different considering the bootstrap
analysis of the Fst values. These three groups also showed a
statistically low or absent differentiation from the
wild species, even though their genetic distance was similar
to that of the other pairwise comparisons involving
wild species and S. lycopersicum varieties. This suggests
that the small sample size of the fresh market groups
identified by the analysis of population structure may
contribute to the lack of a significant genetic differentiation.
Finally, pairwise genetic distance and Fst indicated
the lack of a significant difference between one fresh
market and one processing Structure group.

Loci under selection 

Locus specific estimates of Fst were calculated to iden- 
tify genomic regions that have been the target of selec- 
tion. Wild species were not included in this analysis. A 
locus-by-locus pairwise Fst comparison between the 
different tomato classes indicated the presence of sub- 
stantial variation among loci (Figure 3). A variable per- 
centage of loci, from 4% (cherry vs landraces) to 19% 
(cherry vs processing) had a negative Fst, reflecting the 
fact that for these SNPs more variance exists within than 
across subpopulations. For all comparisons the highest 
percentage of loci (on average 31%) had Fst values ran- 
ging from 0 to 0.05, implying limited variation of allele 
frequencies between subpopulations. Large differences 
among pairwise comparisons were found in the number 
of loci with very high Fst values (>0.5), whose percentage 
ranged from 18% (cherry vs landraces) to 0 for the fresh 
vs processing comparison. The percentage of loci that 
are above the 95% or 99% upper confidence intervals 
varied little (Additional file 10: Table S9) and implied 
the presence of outliers in all pairwise comparisons. To 
statistically identify candidates for loci under selection 
between landraces and the market classes of commercial 
varieties, we carried out an analysis based on the detection
of SNPs that had excessively high or low Fst compared
to neutral expectations (Figure 4). This method
identified 37 SNPs falling outside the 95% confidence 
Figure 3 Distribution of pairwise Fst values among the four cultivated tomato groups. The percentage of loci for which Fst could not be
determined was 15% for C vs F, 15% for C vs L, 13% for C vs P, 28% for F vs L, 24% for F vs P and 22% for L vs P (C: cherry; F: fresh market;
L: landraces; P: processing).
Series: Cherry vs Fresh; Cherry vs Landraces; Cherry vs Processing; Fresh vs Landraces; Fresh vs Processing; Landraces vs Processing.
X-axis: Fst classes (<0.05, <0.1, <0.15, <0.2, <0.25, <0.3, <0.35, <0.4, <0.45, <0.5, <=1).



boundaries for the landraces vs processing comparison,
34 for the landraces vs fresh market and 30 for the landraces
vs cherry (Additional file 11: Table S10). Furthermore,
the proportion of coding and non-coding loci
under selection was not significantly different from their
distribution in the entire dataset (P > 0.05; Pearson's chi-squared
test). Figure 5 illustrates the number of loci that
were common or specific in the comparison between
landraces and each of the three market classes
of contemporary cultivars. Overall, a high proportion of
these loci are located on chromosome 11. Five loci
were common among the different comparisons and
Table 3 reports their main genetic features. Their functions
are consistent with a role in adaptation, as these
genes are involved in processes that are vital for plant
growth and survival under stressful environmental conditions.
It is expected that the majority of the identified
loci indicate genomic regions that have been differentiated
during selection and breeding, although their detection
does not provide indication of the direction of
causality. Interestingly, in various cases we found association
between two consecutive SNPs of our panel. For
instance, SL10019_376 and SL10450_71, both
located on chromosome 3, were identified as being
putatively under selection in the comparisons between
landraces and each of the three classes of contemporary
varieties. Furthermore, six consecutive SNPs (SL10240_154,
SL20173_496, SL20027_428, SL20181_382, SL10715_489
and SGN-U312814_254) of chromosome 11 were
identified as being under selection between the cherry
group and landraces.
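
The comparison between the coding/non-coding split of the outlier loci and that of the whole panel can be sketched as a Pearson chi-squared goodness-of-fit test. The panel proportions (48 coding and 127 non-coding SNPs out of 175) come from the text; the split of the 37 outliers into 12 coding and 25 non-coding loci is hypothetical and is used only to show the arithmetic.

```python
def chi_squared_gof(observed, expected_props):
    """Pearson chi-squared statistic for observed counts vs expected proportions."""
    total = sum(observed)
    return sum(
        (obs - total * prop) ** 2 / (total * prop)
        for obs, prop in zip(observed, expected_props)
    )


# Proportions of coding and non-coding SNPs in the whole panel (from the text).
panel_props = [48 / 175, 127 / 175]

# Hypothetical split of 37 outlier loci into coding and non-coding.
outliers = [12, 25]

chi2 = chi_squared_gof(outliers, panel_props)
CRITICAL_DF1_ALPHA_05 = 3.841  # chi-squared critical value, 1 degree of freedom
print(chi2, chi2 < CRITICAL_DF1_ALPHA_05)
```

With two categories the test has one degree of freedom, so a statistic below 3.841 corresponds to P > 0.05, that is, no significant departure from the panel proportions.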

Discussion

Our aim was to investigate population structure and 
genetic differentiation within the cultivated tomato germ- 
plasm and to identify loci that can putatively account for 
the observed differences. Understanding genetic resources 
is an important step in order to exploit traits such as nu- 
tritional and quality value from cultivated material, espe- 
cially if it is well adapted to local environments or has not 
been exposed to modern breeding [25].

Genetic diversity for each of the predefined subpopulations
was measured using allelic richness, expected
heterozygosity, and polymorphic information content.
Significant differences among cultivated tomatoes were
present considering the allelic richness per locus, with
the only exception being the fresh vs processing comparison.
Landraces have lower allelic richness, a higher
number of rare alleles and a lower number of private alleles
when compared to contemporary cultivars. Thus,
the data suggest that a good portion of the genetic diversity
and specific adaptation of the investigated Italian
landraces was captured in the founder lines of the contemporary
varieties. However, it should be noted that
the SNPs employed were selected as polymorphic in
contemporary varieties and therefore their use may not
be ideal to detect private polymorphisms or rare alleles
potentially involved in directional selection of landraces
[33]. It is likely that the very low number of private
alleles also reflects the fact that different fruit
shapes and plant habits are represented in our landraces
collection.
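
For reference, two of the diversity measures named above can be written compactly for a single locus. The sketch below assumes the usual definitions: expected heterozygosity (gene diversity) as one minus the sum of squared allele frequencies, and polymorphic information content as gene diversity minus a correction for uninformative heterozygous matings. The example frequencies are hypothetical.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content for one locus."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


# Hypothetical biallelic SNP with a major allele frequency of 0.9.
print(gene_diversity([0.9, 0.1]))
print(pic([0.9, 0.1]))
```

For a biallelic SNP both measures peak when the two alleles are equally frequent (0.5 for gene diversity, 0.375 for PIC), so loci with a major allele frequency above 90% contribute little to either.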

The model-based clustering method for inferring 
population structure indicated that landraces constitute 
Figure 5 Number of loci under positive selection between the
landraces and the different market classes. The intersecting
portions of the Venn diagram illustrate the number of common loci
among the different comparisons.



distinct subpopulations compared to contemporary varieties.
This result was evident when considering both
non-coding and coding SNPs. Furthermore, our study
confirmed that contemporary varieties can be divided
into populations that reflect different market classes. All
these findings were supported by an analysis of genetic
differentiation, which indicated a significant distinction
between all tomato types. Our results are consistent with
previous studies, which proposed that the genetic differentiation
between processing and fresh market varieties
mainly reflects breeding for ideotypes related to distinct
production systems [22]. In that work, the two processing
sub-groups were associated with breeding history in
the USA, while subpopulations were not discernible in
fresh market cultivars and in vintage varieties. We found
a subdivision of contemporary cultivars that is also associated
with different fruit shapes, as these varieties were
separated into three classes (fresh, processing and cherry)
that are rather homogeneous with respect to fruit morphology
(round, elongated and cherry, respectively) [34]. Furthermore,
using the entire SNP dataset, we did not detect
further subdivisions in the processing tomatoes, while
the fresh market varieties were assigned to different
groups. Irrespective of the type of SNPs employed, fresh
market varieties showed the highest degree of population
structure, which is consistent with the more competitive
breeding activity and diversification of this market class
when compared to processing tomatoes [4]. The data
also provided evidence for subpopulation structure between
cultivated cherry and wild species. Although anticipated
[7,11,34], a differentiation between cultivated
cherry and wild cherry (or landraces) has not always
been found [6]. The cherry group showed the highest
level of admixture, most likely because several varieties
that were assigned to this group lack a clear separation
between processing and cherry. For instance, cultivars
such as 'Tomito', 'Kikko', 'Birba', 'Mascalzone', etc. are
improved and sold by breeding companies for both processing
and fresh market. Such an explanation is also
corroborated by the fact that the highest number of loci
with negative Fst was present in the cherry vs processing
comparison. Overall, considering also the allelic richness
and the number of private alleles of the cherry group, the data
indicated that this market class has the highest genetic
variation [23,33]. Our data are consistent with a diverse
breeding foundation for the cherry market class.

Selection for fruit shape is considered an important
factor responsible for genetic structure in tomato cultivars
[35]. It is therefore interesting that the landraces'
subpopulations include a range of fruit shapes (e.g.
elongated, cherry, round, oxheart, etc.). Unlike
other studies, the analysis provided evidence for subpopulations
within landraces. A distinction based
on fruit shape was possible for the oxheart type accessions
using both coding and non-coding markers.



Table 3 Candidate loci under positive selection that were common among the pairwise comparisons between
cultivated landraces and the different market classes of contemporary varieties

                                                                                                                 Expected heterozygosity-Fst
Marker            Chrom.   Exon/intron   Gene name*         Description*                                          vs processing   vs fresh market   vs cherry
SL10450_71        3        Exon          Solyc03g114120.2   Ribonuclease III                                      0.72-0.70       0.14-0.13         0.57-0.24
SL10019_376       3        Intron        Solyc03g113990.2   Uncharacterized conserved protein                     0.67-0.66       0.18-0.13         0.56-0.20
SL20017_699       5        Intron        Solyc05g050900.2   Spindle and kinetochore-associated protein 1 homolog  0.53-0.50       0.47-0.44         0.52-0.29
SGN-U313292_417   11       Exon          Solyc11g072190.1   Elongation factor beta-1                              0.29-0.29       0.29-0.29         0.40-0.39
SGN-U312814_254   11       Exon          Solyc11g069430.1   Aquaporin 1                                           0.55-0.40       0.54-0.32         0.62-0.49

*Gene name and description were retrieved from the Solgenomics network (http://solgenomics.net/organism/Solanum_lycopersicum/genome).
Overall, our data indicate that the tomato landraces
differ from contemporary varieties in that the former bear a
higher number of minor alleles (and related allele
frequencies) and a stronger population structure, as
indicated by the membership coefficient. These features are
usually explained considering a strong divergent or dir- 
ectional selection operating on many traits during adap- 
tation to local conditions and practices. Most plant 
populations are expected to exhibit significant adapta- 
tion, especially in the presence of recurrent selection for 
optimal performance in specific environments [36,37]. 
Alternatively, the genetic features of the landraces could
also be explained by the recent history of tomato.
Breeding of the different market classes has been driven
by the common needs of introgressing traits from
wild species and of lowering the cost of mechanical
practices. However, in this scenario it would be difficult
to explain the population structure of the contemporary
varieties that we and others have reported.

We also compared coding and non-coding SNPs. We 
did not observe large differences in the polymorphism as 
measured by allelic richness or alleles per locus. As ex- 
pected, landraces displayed a greater polymorphism in 
non-coding markers [38,39] yet contemporary varieties 
had a greater diversity in coding SNPs. Although not all
intronic regions are necessarily selectively neutral,
this may reflect the fact that polymorphism in contemporary
varieties essentially derives from breeding efforts.
While the analysis of the frequency of minor alleles indi- 
cated that selection and adaptation may have changed 
the frequency of predominant alleles in landraces, the 
data also suggest that contemporary breeding has in- 
creased allelic diversity relative to traditional landraces, 
especially in coding regions. This hypothesis should be 
tested by analyzing haplotype structures. 

Differences in the ability of markers to discriminate 
and assign individuals to a subpopulation were not ob- 
served for the a priori tomato groups. Irrespective of the 
type of marker employed, a distinction between land- 
races and contemporary varieties was well supported. 
The optimal number of clusters in the population
structure analysis, however, differed between marker
types. Coding SNPs distinguished more subpopulations,
although not all the groups were different in
terms of genetic differentiation. However, the data also
suggested that the small number of genotypes in those
groups could contribute to the lack of statistically
significant differences. The data indicated that the location of
the polymorphism within a gene affects the performance
for population analysis in the tomato. Although the coding
and non-coding markers represent different loci, it is
reasonable to speculate that the further subdivisions we
have observed reflect the fact that genome scans based
on coding markers are more likely to detect molecular
adaptation linked to genes, although this holds true
especially for species with a rapid linkage disequilibrium
(LD) decay.

The identification of loci that have undergone positive 
selection is a fundamental step in understanding how 
populations have adapted to specific environments and 
agronomic practices. Such studies are increasingly
widespread [40,41] and can also provide insights into the
history of the plant species under investigation. Considering
that tomato has experienced severe genetic bottlenecks,
it is difficult to distinguish selective sweeps from
the effects of genetic drift due to the bottlenecks
themselves. We used an Fst-based statistic to assess whether the
variation of SNP allele frequencies among populations 
can identify signatures of selection [41,42]. If Fst is de- 
termined only by genetic drift, the vast majority of the 
loci should be affected in a similar way [43]. However, 
we observed the presence of a locus-specific selection 
pressure in different loci and, in various cases, in linked 
genetic markers. For instance, the comparison between 
landraces and cherry tomatoes indicated that some ex- 
tended chromosomal regions may be under diversifying 
selection relative to other regions of the genome. Fur- 
thermore, considering the number and location of puta- 
tive loci under selection in studies that mainly compared 
commercial cultivars [6,22], the data indicated that var- 
ious specific regions may differentiate landraces from 
contemporary varieties. 
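
The reasoning above (drift is expected to shift Fst at most loci in a similar way, whereas selection acts locus by locus) can be illustrated with a minimal sketch. It assumes biallelic SNPs, two equally weighted populations and the simple estimator Fst = (Ht - Hs) / Ht. The function names, the empirical top-fraction cut-off and the example frequencies are hypothetical, and the sketch is not the simulation-based outlier procedure implemented in Lositan.

```python
def fst_biallelic(p1, p2):
    """Fst = (Ht - Hs) / Ht for one biallelic SNP, given the frequency of
    the same allele in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall: no differentiation
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


def flag_high_fst(freqs, top_fraction=0.05):
    """Rank loci by Fst and return the names in the top fraction.

    freqs maps a locus name to (p1, p2). The cut-off is purely empirical
    (an assumption of this sketch), not a simulated neutral envelope.
    """
    fst = {name: fst_biallelic(p1, p2) for name, (p1, p2) in freqs.items()}
    ranked = sorted(fst, key=fst.get, reverse=True)
    n_top = max(1, int(round(top_fraction * len(ranked))))
    return ranked[:n_top], fst


if __name__ == "__main__":
    # Hypothetical allele frequencies (landraces, one market class).
    example = {
        "locus_a": (0.50, 0.52),
        "locus_b": (0.90, 0.10),
        "locus_c": (0.30, 0.35),
        "locus_d": (0.60, 0.55),
    }
    candidates, values = flag_high_fst(example, top_fraction=0.25)
    print(candidates, values)
```

An empirical ranking of this kind only highlights loci that stand out from the genome-wide background; it does not by itself separate selection from drift, which is why a neutral expectation is simulated in outlier methods.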

Although the majority of the loci had a low Fst value 
in pairwise comparisons, our data showed the presence 
of genomic regions with high genetic variation between 
sub-populations. The loci we have identified are of po- 
tential interest for plant breeders as they likely contri- 
bute to the existing differences between contemporary 
and local varieties. Considering the LD of the tomato
[23], one obstacle is to distinguish loci that are merely
associated with selected genes from the selected genes
themselves. On the other
hand, the identification and exclusion of loci under se- 
lection is necessary to avoid biased estimates of other 
genetic parameters such as demographic factors and his- 
torical bottlenecks. It is interesting that our results 
showed that it is possible to efficiently detect a geo- 
graphical specificity in tomato. Thus, our data imply that 
it is conceivable to identify markers useful to infer gen- 
etic ancestry in cultivated tomato by selecting loci with 
the highest Fst values and with the ability to yield the 
largest coefficient of membership for the predefined 
groups [44]. The loci that can effectively capture vari- 
ation within populations of interest facilitate candidate 
gene and fine-structure association studies by allowing 
for efficient control of population stratification [45]. Be- 
sides, their selection is important to identify individuals 
with greater amounts of admixture so that they can be 
removed from the breeding pool [46]. 
Conclusions

Our data indicate that selection and adaptation led to 
specific patterns of genetic variation in the cultivated to- 
mato germplasm. To date, genomic evidence for the spe- 
cificity of cultivated tomato landraces has been largely 
inferred from a limited number of samples or markers. 
The observed genetic differentiation within contemporary
market classes should reflect division into alternative
breeding programmes, selection for specific traits (e.g.,
fruit shape) and their combinations. Finally, the data
indicate that landraces may carry an extended footprint at
the genomic level, which deserves further investigation.
The disappearance of local varieties represents another
cause of reduction of tomato diversity [14,15,47] and this
study provides evidence to encourage a long-term effort
for the characterization and exploitation of cultivated
tomato landraces.

Methods 

Plant material and DNA isolation 

The germplasm of cultivated (Solanum lycopersicum)
and wild tomatoes used in this study is listed in Additional
file 1: Table S1. We analysed 214 genotypes, which
included 30 cherry, 37 fresh-market, 76 landrace and 65
processing accessions of S. lycopersicum, along with six
wild species. Landrace (also called heritage) tomatoes
represent cultivated, open-pollinated accessions that in- 
clude farmers' selections and traditional types. Although 
the exact historical origin of this material is not always 
known, our landraces can be considered as regional acces- 
sions that originated in Italy and whose diversity has been 
maintained by local farmers. Processing, fresh market and 
small fruit/cherry varieties represent a selection of com- 
mercially relevant cultivars. The classification in different 
market-classes reflects that of the tomato seed companies. 
'Microtom', a variety developed for ornamental purposes
[48], was included in the cherry group. The analyzed
collection included Heinz 1706 and LA 1589, whose
genomes have recently been sequenced. DNA isolation was
carried out on young true leaves, according to previously 
reported procedures [49]. 

Genotyping 

We used the Illumina GoldenGate assay for large-scale
SNP validation, utilizing a customized design based on the
384-format Genotyping Assay. The SNP set comprised
polymorphisms, distributed throughout the genome, se- 
lected from literature [6,50] and the SOL Genomics Net- 
work (http://solgenomics.net). Briefly, the sequence of 
each selected locus, including the polymorphic nucleotide 
and a 60-bp flanking sequence, was submitted to the
Illumina Assay Design Tool (Illumina). The GoldenGate assay
was arrayed on the BeadXpress Reader (an automated
fluidics and multi-laser imaging device platform) using the
VeraCode technology (Illumina). The labeled allele-specific
PCR products were hybridized to the VeraCode
beads, each bearing a locus-specific barcode via the corre- 
sponding Illumicode sequence. A supervised allele calling 
for each locus was accomplished based on the data ge- 
nerated by the GenomeStudio Data Analysis software 
(Illumina). We tested 192 SNPs. Fifteen were removed 
from the genetic analysis because of the percentage of 
missing data points (> 5%). The genotyping with the Illu- 
mina GoldenGate platform was carried out at the Parco 
Tecnologico Padano (http://www.tecnoparco.org/). 
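
The missing-data criterion described above can be sketched as follows. This is only an illustration, assuming genotype calls are held in a dictionary keyed by locus with one call per accession and that a missing call is represented by `None`; the function name and data layout are hypothetical and do not reflect the GenomeStudio export format.

```python
def filter_by_missing_rate(genotypes, max_missing=0.05, missing=None):
    """Keep loci whose fraction of missing calls does not exceed max_missing.

    genotypes: dict mapping locus name -> list of calls (one per accession).
    missing:   value used to mark a failed call (assumption: None).
    """
    kept = {}
    for locus, calls in genotypes.items():
        if not calls:
            continue  # no data at all for this locus
        n_missing = sum(1 for call in calls if call == missing)
        if n_missing / len(calls) <= max_missing:
            kept[locus] = calls
    return kept


if __name__ == "__main__":
    # Hypothetical calls for 20 accessions at two loci.
    calls = {
        "snp_ok": ["AA"] * 19 + [None],       # 5% missing: retained
        "snp_bad": ["AG"] * 18 + [None] * 2,  # 10% missing: removed
    }
    print(sorted(filter_by_missing_rate(calls)))
```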

Classification of markers 

To determine the physical positions of the SNP markers
used in this study, the sequences used to develop these
SNPs were aligned (BLASTN) against the tomato genome.
Only the top hits with an e-value < 1e-10 were considered.
Information on the location of the SNPs and
their gene feature details are presented in Additional
file 12: Table S11. Using the available genome annotations
(SL2.40), we categorized the SNPs as "coding" (i.e., those
located in exonic regions) and "non-coding" (i.e., those
located in introns as well as intergenic regions). Location in
gene models was identified using the SGN genome browser
(ITAG2.3 genomic annotation).
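
The top-hit selection described above can be sketched as follows, assuming the BLASTN results were saved in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score). The function name and the choice of bit score to rank hits are assumptions of this sketch, not details reported in the study.

```python
def top_hits(blast_lines, max_evalue=1e-10):
    """Return the best hit per query with e-value below max_evalue.

    blast_lines: iterable of tab-separated lines in 12-column tabular format.
    The best hit is taken to be the one with the highest bit score.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        s_start, s_end = int(fields[8]), int(fields[9])
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {
                "subject": subject,
                "start": min(s_start, s_end),  # hits may be on the minus strand
                "end": max(s_start, s_end),
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


if __name__ == "__main__":
    # Hypothetical hits for two marker sequences.
    rows = [
        "snp1\tchr03\t100.0\t121\t0\t0\t1\t121\t5000\t5120\t2e-60\t224",
        "snp1\tchr07\t90.0\t80\t8\t0\t1\t80\t900\t821\t1e-15\t75.0",
        "snp2\tchr11\t85.0\t40\t6\t0\t1\t40\t10\t49\t1e-04\t35.0",
    ]
    print(top_hits(rows))
```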

Data analysis 

Gene diversity, Polymorphism Information Content (PIC),
allele frequencies and allelic richness were calculated as
already described [51-53] using the PowerMarker [54]
and the MSA [55] software. Population differentiation
tests and related statistics were carried out with PowerMarker
as previously reported [56]. Possible population structure
was estimated using a model-based Bayesian procedure
implemented in the software Structure v2.3 [31] and
Structure Harvester [57]. The analysis was carried out
using a burn-in period of 25,000 iterations and a run
length of 500,000 MCMC replications. We tested a
continuous series of Ks, from 1 to 12, in ten independent
runs. We did not introduce prior knowledge about the
population of origin and assumed correlated allele
frequencies and admixture [58]. The most informative K
was identified using the ad hoc statistic ΔK, which is
based on the rate of change in the log probability of data
between successive K values [59], and the analysis of
variance of the log likelihood values using the
non-parametric Kruskal-Wallis test [60] (SPSS Statistics 20;
IBM). The estimated cluster membership coefficient
matrices of the ten runs were permuted so that all replicates
have the closest match possible and then averaged
across replicates using the Greedy algorithm of the software
CLUMPP [61]. To validate the predefined or the
estimated population structure, we calculated pairwise
Fst and Nei's standard genetic distance (Dst) between
populations [51,62] using MSA [55]. The reference distribution
for P-value calculation of the Fst analysis was
based on 10,000 permutations. We identified loci under
positive selection between pre-defined populations of
cultivated tomato using an Fst-outlier detection method
[42] implemented in the software Lositan [63]. We ran
100,000 iterations, using a 0.95 confidence interval and
an infinite allele model. Loci that deviate from the expected
distribution of neutral markers were identified on
the basis of excessively high or low Fst.
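
The ΔK statistic mentioned above can be illustrated with a minimal sketch. It assumes the log probability of the data, Ln P(D), is available for each replicate run at each K, and it computes ΔK as the absolute second-order difference of the run means divided by the standard deviation across runs at K. Implementations differ in whether the absolute difference is taken per run or on the run means, so the function below (a hypothetical name) is one plausible variant and not necessarily the computation performed by Structure Harvester.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K -> list of Ln P(D) values from replicate runs
              (at least two runs per K are needed for the standard deviation).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the K series
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # identical runs: the ratio is undefined
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


if __name__ == "__main__":
    # Hypothetical Ln P(D) values for two runs at K = 1..3.
    runs = {1: [-1000.0, -1002.0], 2: [-900.0, -902.0], 3: [-890.0, -892.0]}
    scores = delta_k(runs)
    print(scores, max(scores, key=scores.get))
```

With a series of K from 1 to 12, ΔK is defined only for K = 2 to 11, so the statistic cannot by itself support K = 1.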

Additional files 